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ABSTRACT 

New telescopes like the Square Kilometre Array (SKA) will push into a new sensitivity 
regime and expose systematics, such as direction-dependent effects, that could previ¬ 
ously be ignored. Current methods for handling such systematics rely on alternating 
best estimates of instrumental calibration and models of the underlying sky, which can 
lead to inadequate uncertainty estimates and biased results because any correlations 
between parameters are ignored. These deconvolution algorithms produce a single im¬ 
age that is assumed to be a true representation of the sky, when in fact it is just one 
realisation of an infinite ensemble of images compatible with the noise in the data. In 
contrast, here we report a Bayesian formalism that simultaneously infers both system¬ 
atics and science. Our technique, Bayesian Inference for Radio Observations (BIRO), 
determines all parameters directly from the raw data, bypassing image-making en¬ 
tirely, by sampling from the joint posterior probability distribution. This enables it to 
derive both correlations and accurate uncertainties, making use of the flexible software 
MeqTrees to model the sky and telescope simultaneously. We demonstrate BIRO with 
two simulated sets of Westerbork Synthesis Radio Telescope datasets. In the first, we 
perform joint estimates of 103 scientific (flux densities of sources) and instrumental 
(pointing errors, beam width and noise) parameters. In the second example, we perform 
source separation with BIRO. Using the Bayesian evidence, we can accurately select 
between a single point source, two point sources and an extended Gaussian source, 
allowing for ‘super-resolution’ on scales much smaller than the synthesised beam. 

Key words: methods: data analysis - methods: statistical techniques: inteferomet- 
ric. 


1 INTRODUCTION 

The high sensitivity of the SKA (up to 50 times more sen¬ 
sitive than current instruments (Carilli & Rawlings 2004)) 
combined with a relatively cheap antenna design means a 
far more careful and detailed treatment of systematics will 
be required to fully exploit this telescope (Noordam 2000). 
The current approach to this calibration problem iteratively 
applies deconvolution methods such as CLEAN (Hogbom 


1974), alternating with sky and instrumental modelling to 
determine the best-fitting, calibrated image (Pearson & 
Readhead 1984; Kazemi et al. 2011; Kazemi & Yatawatta 
2013; Bhatnagar et al. 2008). This provides only a point es¬ 
timate of the model parameters which will in general differ 
from the true parameters due to random noise (Enfilin et al. 
2014). 

A more rigorous approach is to infer the science and 
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instrumental parameters simultaneously, deriving accurate 
uncertainties and correlations between them. Work in this 
direction includes improvements on the self-calibration algo¬ 
rithm (Pearson & Readhead 1984; Enfilin et al. 2014; Enfilin 
2014; Dorn et al. 2014) and some extensions to the RE¬ 
SOLVE algorithm (Junklewitz et al. 2013; Junklewitz, Bell & 
Enfilin 2014). There has also been considerable effort in this 
direction in producing a maximum posterior image for the 
data and dealing with certain calibration parameters (Sut¬ 
ton & Wandelt 2006; Sutter et al. 2014a,b). These works 
each solve specific aspects of the calibration and deconvo¬ 
lution problem, but so far do not explore the full posterior 
distribution, giving an inaccurate estimation of the uncer¬ 
tainties and correlations, and still rely on producing a single 
image (i.e. a point estimate). 

We propose instead a new technique, called Bayesian 
Inference for Radio Observations (BIRO), which is able to: 
include any source of instrumental uncertainty, such as iono¬ 
spheric effects, pointing errors and primary beam uncertain¬ 
ties, jointly determine the science and instrumental param¬ 
eters and provide reliable estimates of the uncertainties and 
correlations on these parameters, in a holistic and mathe¬ 
matically rigorous manner. 

A simultaneous analysis requires the full posterior prob¬ 
ability distribution of the parameters, which can naturally 
be sampled in the Bayesian formalism by using (for ex¬ 
ample) MCMC (Metropolis et al. 1953; Hastings 1970) or 
nested sampling (Skilling 2004). Our new technique, BIRO, 
fits models including both instrumental and science parame¬ 
ters directly to the raw visibility data. We use the MeqTrees 
(Noordam & Smirnov 2010) software, which implements 
the Radio Interferometry Measurement Equation (RIME) 
(Hamaker, Bregman & Sault 1996), for the modelling of the 
sky and instrumental effects. This technique thus obviates 
the need for intermediate imaging and map-making. The rig¬ 
orous statistical use of all available information allows this 
technique to open new discovery windows, solving previously 
intractable problems, and is applicable to all interferometers 
and problems in radio interferometry. 

This paper is arranged as follows: in section 2 we pro¬ 
vide an introduction to Bayesian statistics and illustrate the 
use of the RIME for modelling in the BIRO algorithm in sec¬ 
tion 3. We then apply BIRO to two key simulated datasets to 
demonstrate its power: In section 4, we jointly fit all scientific 
(source flux densities) and instrumental parameters (point¬ 
ing errors, primary beam parameters and receiver noise) to 
a dataset suffering from direction-dependent instrumental 
effects. In section 5, we focus on the problem of reliably dis¬ 
tinguishing between an extended source, point source and 
a pair of close point sources, for sources on sub-synthesised 
beam scales. We conclude in section 6. 


2 BAYESIAN STATISTICS 

The problem of obtaining the most information possible from 
an incomplete dataset, such as obtained by an interferome¬ 
ter, is perfectly suited to the application of Bayesian statis¬ 
tics. These allow the fitting of arbitrarily complex models to 


data, providing reliable uncertainty estimates for the param¬ 
eters. Bayes’ theorem allows the use of a familiar quantity, 
the likelihood, to answer the question one is really interested 
in: what is the probability of an hypothesis, given the data 
in hand? This probability is known as the posterior and in¬ 
dicates by how much our degree of belief in the hypothesis 
has been updated by the new data. Simple application of 
Bayes’ theorem also allows a robust and intuitive way to 
compare models, which we will require for the second ex¬ 
ample problem in this paper. What follows here is a brief 
overview of Bayesian theory, see Trotta (2008) for a more 
in-depth review. 

From Bayes’ theorem, the probability distribution, 
"P (0|D ,77), of the values of parameters ©, the quantity 
that is actually sought, given the data D that are in-hand 
and a model 77 (hypothesis plus any assumptions), is: 


p(e|D ,H) 


£(D|©,77 ) 77 (©|77) 
Z(D\H) 


(1) 


This is known as the posterior probability distribution. The 
likelihood £(D|©,77), which encodes any constraints im¬ 
posed by observations, is the probability distribution of the 
data given parameter values and a model. 

The prior 17 (©|Jf) includes any prior knowledge of or 
prejudices about the parameter values. Z (D|77) is the inte¬ 
gral of C, (D|©, 77) 77 (©|77) over all ©, not simply normal¬ 
izing the posterior P(0|D,77), but also allowing selection 
of different models by comparing their values quantitatively. 
This so-called evidence, Z (D|T7), automatically includes an 
Occam’s razor effect, penalising models with a large number 
of parameters that are not preferred by the data. By com¬ 
puting the evidence for a range of models we can select the 
best model by maximising the evidence. 

For this work, the likelihood function is 


£(D|©,77) = 


(27T(7 2 ) iV / 2 


exp 


-/V 2 \ 

^(Vi(0)-w) j/2 


( 2 ) 

where Vj (©) are the model visibilities produced by MeqTrees 
(see section 3), with the parameters © as input, V; are the 
data visibilities, N is the number of data points. Here we 
assume the uncertainties on the visibilities are Gaussian and 
have the same value, a, for all datapoints. The best-fitting 
model corresponds to maximum posterior. 

The inferred posterior distributions are full probabil¬ 
ity distributions rather than a summary mean/median value 
and a (perhaps covariant) uncertainty, since this represents 
the total inference about the problem at hand. These distri¬ 
butions may be highly non-Gaussian, making such summary 
parameters inaccurate. 

The application of Bayesian statistics allows one to 
marginalise out the effects of nuisance parameters, which 
are parameters such as the beam shape and pointing errors 
that are not of primary interest, but are unknown and can 
affect the estimates of the parameters of interest (i.e. sci¬ 
ence parameters) because of correlations and degeneracies. 
The marginalised posterior can be written as a function of 
the parameters of interest, 4>, the nuisance parameters, 4', 
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RIME by integrating over the direction cosines, l and m: 


and the data, D: 

P{$\'D,H) = J (3) 

where the integral is performed over the parameter space of 

’E. 

The posterior is, thanks to advances in modern comput¬ 
ing, fairly easily determined using numerical techniques. In 
this paper, we use the Markov Chain Monte Carlo (MCMC) 
Metropolis-Hastings (Metropolis et al. 1953; Hastings 1970) 
algorithm for the joint scientific and instrumental param¬ 
eter inference example. We chose MCMC due to its sim¬ 
plicity and the ease with which it handles large numbers of 
parameters (we have 103 parameters for the first example 
problem). For our second example, that of model selection 
related to source separation and extended structure, we re¬ 
quire efficient calculation of the Bayesian evidence, some¬ 
thing provided naturally by the nested sampling algorithm. 
We utilise the public code MultiNest (Feroz & Hobson 2008; 
Feroz, Hobson & Bridges 2009) to determine both the pa¬ 
rameters and the evidence for model comparison (Jeffreys 
1998; Trotta 2008), but for a smaller set of parameters as 
nested sampling grows rapidly in complexity with increas¬ 
ing number of parameters. 


3 USING THE RIME FOR MODELLING 

Previous Bayesian visibility analyses (Lancaster et al. 2005; 
Feroz et al. 2009; Zwart et al. 2011; AMI Consortium 2012; 
Sutter et al. 2014a) focused on the sky model and were not 
generalised to include arbitrary instrumental effects (or were 
attempting to solve for a much more general sky model re¬ 
sulting in many more parameters, thus needing to fix in¬ 
strumental parameters). The Radio Interferometry Measure¬ 
ment Equation (RIME) (Hamaker, Bregman & Sault 1996; 
Smirnov 201 la, b) provides a powerful framework to easily 
describe exactly what happens to a signal as it travels from 
source to telescope, where it is converted into voltages. The 
RIME is a natural way to model the instrumental and sci¬ 
entific effects that we are inferring through our Bayesian 
technique. For example, the RIME for a single point source 
is given by 

V P9 = J P B Jf. (4) 

where B is the brightness matrix, which describes the sky 
flux distribution, J p is the Jones matrix (Jones 1941) for 
antenna p, containing all instrumental and atmospheric ef¬ 
fects that interfere with the signal, J q is the Jones matrix for 
antenna q, H indicates the Hermitian of a matrix and \Z pq 
are the visibilities, the outputs of the telescope correlator for 
baseline pq. 

The effects that interfere with the signal on its route to 
the output of the telescope can each be described by a Jones 
matrix, with each effect adding a pair of Jones matrices in 
the ‘onion’ form of the RIME: 

V P9 = J pn (. .. (J p2 (J pl B jfOJf 2 ).. (5) 

We can go a few steps further and consider the full-sky 


M pq = G p(J [ EpK p BKf Ef dldm'j G q . (6) 

Here, K p and K q are the Jones matrices describing the ge¬ 
ometric delay between antennas p and q, G p represents the 
direction-independent gains for antenna p, which we set to 
unity for all antennas, and E p is the Jones matrix contain¬ 
ing all the direction-dependent effects for antenna p. We fo¬ 
cus in this paper on the more difficult to handle direction- 
dependent effects, but direction-independent can also be 
handled with our technique. As with all other Jones ma¬ 
trices, E p can be written as a product of Jones matrices, 
each describing a different effect. In section 4, we consider 
both primary beam effects and pointing errors as examples 
of direction-dependent effects each with their own Jones ma¬ 
trix. 

The RIME is implemented in the general, flexible soft¬ 
ware MeqTrees (Noordam & Smirnov 2010; Smirnov & de 
Bruyn 2011; Smirnov 2011c), which allows us to apply it to 
any sky model and for any telescope. MeqTrees has been 
useful for predicting the capabilities of future experiments 
and for understanding the intricacies of current telescopes. 
Here, we go a step further and use MeqTrees as the mod¬ 
elling step in our Bayesian analysis. In order to test BIRO 
and compare it with the standard deconvolution approach, 
we use datasets simulated with MeqTrees over which we have 
complete control and thus would know if we were correctly 
recovering the true input parameters. 

MeqTrees takes from the user a sky model (such as the 
number and distribution of sources, their fluxes, shapes etc.) 
as well as instrumental details (such as the telescope config¬ 
uration, primary beam pattern, pointing errors, noise, at¬ 
mospheric effects, ionospheric effects etc.) and uses the mea¬ 
surement equation to produce realistic simulated visibilities 
that such a telescope would observe. 

In order to test the validity of our technique, we only 
work with simulations in this paper. We use MeqTrees to 
simulate the data and also to model the sky, to test if we re¬ 
cover the input parameters. MeqTrees can be used to model 
any telescope configuration and any sky and instrumental 
effects that can be described with the RIME. While we only 
concentrate on primary beam and pointing error effects in 
section 4, in principle, a wide variety of source types and 
instrumental corruptions can be added in MeqTrees. 

Fig. 1 shows a schematic overview of the BIRO ap¬ 
proach. At each step in the chain of MCMC or Multi- 
Nest, MeqTrees is called with new values for the parameters. 
MeqTrees then returns a visibility set that can be compared 
directly with the simulated data, to determine how well the 
parameters fit. This iterative process allows the determina¬ 
tion of the full posterior for the parameters. We do not as 
yet have a public release of the BIRO code, but plan to in 
the future where we will integrate MontBlanc, a GPU im¬ 
plementation of the RIME (Perkins et al. 2015) with BIRO. 
MontBlanc is already publicly available meaning it can be 
combined with any sampler to allow the user to implement 
BIRO for themselves. 
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Figure 1: The BIRO algorithm. Fixed or initialized inputs 
are shown in yellow, while the sampling loop is represented 
by the pink boxes. Data products are in blue. The main it¬ 
eration loop occurs within either the MCMC or MultiNest 
algorithm (depending on the problem), where new param¬ 
eters are used in each iteration to compute the likelihood. 
The initial parameters are drawn from the prior, which gen¬ 
erally restricts the parameter ranges. In the final step, the 
ensemble of sky realisations can be generated with MeqTrees 
using the parameter samples in the posterior, if required. 


4 EXAMPLE 1: JOINT INFERENCE OF 
SCIENTIFIC AND INSTRUMENTAL 
PARAMETERS 

In this example, we use BIRO to jointly estimate the scien¬ 
tific parameters and nuisance instrumental parameters. Be¬ 
low we describe the model and simulated dataset used, and 
details of the MCMC analysis, and show that the instru¬ 
mental parameters studied are tightly correlated with the 
scientific parameters, a fact that cannot be ignored when 
determining these parameters. 



Figure 2: The simulated, noise-free sky model with 17 
sources with flux densities varying between 0.03 and 3.13 

Jy- 

4.1 Simulated data and parameters of the model 

4-1.1 Telescope configuration 

We use MeqTrees to simulate observations with the Wester- 
bork Synthesis Radio Telescope (WSRT) (Hogbom & Brouw 
1974), a 14-element East-West array with 25m diameter 
dishes. All our WSRT simulations use an integration time of 
30 seconds and a total observation time of 12 hours at a fre¬ 
quency of 1.4 GHz. We use a narrow bandwidth of 125kHz, a 
single channel (for simplicity) and include noise with a stan¬ 
dard deviation of 0.1 Jy/visibility. At this frequency, WSRT 
has a field of view of 0.5-0.6 degrees and a synthesised beam 
width of around 13 arcsec FWHM (full width at half maxi¬ 
mum) 1 . 

4- 1.2 Scientific parameters 

The simulated held consists of 17 unpolarised, point sources 
with known positions. The science goal was to determine the 
flux densities of these sources. We based the simulation on 
an existing held observed by WSRT, consisting of sources 
with a range of huxes (from 0.03 — 3.13 Jy). This is a very 
simple sky model, consisting only of point sources, whereas 
in the second example of the paper, we address modelling 
of extended sources. We do not explore the possibility of ex¬ 
tended sources of arbitrary shapes, as this is out of the scope 
of this paper, but this should be possible using shapelets, 
such as employed in the existing PyBDSM software 2 . The 

1 WSRT Guide to Observations, www.astron.nl/ 
radio-observatory/astronomers/wsrt-guide-observations/ 

5- technical-information/5-technical-informatio 

2 Python Blob Detection and Source Measurement software, www. 
lofar.org/wiki/doku.php?id=public:user_software:pybdsm 
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Right ascension (°) 


Figure 3: The dirty dataset for the model of Fig. 2, as the 
telescope would see it (the colours are histogram-equalised 
to improve contrast). The image is produced directly from 
the visbilities and shows the typical ring structure around 
bright sources that is seen in interferometric data, due to 
the missing angular-scale information in the dataset. The 
rms noise in flux density is about 0.28 mjy. 

brightness matrix in Eq.(6) for an unpolarised point source 
is written as: 

B e° INT = (J °) , (7) 

where I is the intensity. 

Fig. 2 shows an image of the true input model without any 
instrumental effects, while Fig. 3 shows the dirty image of 
the sky. 

4-1.3 Instrumental parameters 

Beam width 

Knowing the primary beam pattern is critical for any astro¬ 
nomical survey. Current practice is to determine the primary 
beam pattern using a technique such as holography (Scott 
& Ryle 1977), then fix a beam model, without propagating 
any uncertainty information into the estimates of the science 
parameters. Since the primary beam directly attenuates the 
flux distribution of the sky, even a small error in the beam 
model can lead to large biases. We thus include beam pa¬ 
rameters in our analysis. WSRT commonly adopts a simple 
model for the primary beam 1 , namely: cos 3 (ci'd), where v 
is the observing frequency (in GHz), 9 is the distance from 
the pointing centre in degrees and c is the beam factor (in 
1/GHz). The beam factor (or beam width) is known to vary 
slightly with frequency. As proof of concept, we assume it is 
unknown, and include it as a further instrumental parame¬ 
ter. One could provide a more complex model for the primary 
beam and easily fit those parameters with this technique as 
well, comparing the models with the Bayesian evidence. The 


model for the beam enters the RIME of Eq.(6) as a direction- 
dependent Jones matrix: 

E beam ( 1, m ) = cos 3 (cv\/l 2 + m 2 ) I, (8) 

where I is the identity matrix. 

Pointing errors 

Pointing errors can substantially corrupt radio observations 
and are known to be a limiting factor in deep observations 
with WSRT (Smirnov & de Bruyn 2011) and other tele¬ 
scopes. The greatest effect is on sources on the flank of the 
primary beam, where the gradient of the beam pattern is 
steep, and a small pointing error produces a larger error in 
apparent flux (compared to the centre of the beam). Since 
the errors can be different from antenna to antenna, this 
produces errors on the observed visibility amplitudes, which 
translates into artefacts in the image. Essentially each source 
is ‘defocussed’ in a complicated way. Thus, we can immedi¬ 
ately suspect there will be a correlation between the pointing 
errors and source flux densities. Two prior approaches to in¬ 
ferring pointing errors directly from the data have hinged 
on maximum-likelihood estimates. These are the pointing 
selfcal algorithm (Bhatnagar, Cornwell & Golap 2004) and 
direct fitting with MeqTrees (Smirnov 2011 3 ). Neither ap¬ 
proach estimates the correlation between pointing errors and 
source parameters, which the Bayesian approach naturally 
provides. We inject time-varying polynomial pointing errors 
for each of the 14 WSRT antennas. We use a second order 
polynomial for each pointing error and fit for the coefficients. 
A polynomial pointing error in each orthogonal direction for 
each antenna results in a total of 84 pointing-error param¬ 
eters. The pointing errors are written as a Jones matrix in 
Eq.(6): 

E PE {l,m) =E beam (1 + Sl p ,m + 5m p ) (9) 

where Sl p and 5m p are the pointing errors in the right as¬ 
cension and declination direction respectively, for antenna p. 
The pointing errors are taken to be time-varying polynomi¬ 
als, written as: 

Sl p = C2t 2 + Cl t + Co, (10) 

and similarly for 5m p , where t is time (rescaled over the 
observation) and Ck are the coefficients we determine with 
MCMC. 

Noise 

The noise on the visibilities is expected to be Gaussian, sta¬ 
tionary and uncorrelated. Noise level can be estimated with 
some precision from the known system temperature, here 
however we show than it can also be inferred accurately di¬ 
rectly from the data. We thus included one final parameter 
for the standard deviation of the noise on the visibilities. 


3 https://indico.skatelescope.org/getFile.py/access? 
contribld=20&sessionld=9&resld=0&materialld=0&confId=171 
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Figure 4: Bayesian factor graph (see section A of the ap¬ 
pendix) of the model for the first simulated dataset. All pa¬ 
rameters we estimate with MCMC are the constants, with¬ 
out any circles around them, coloured blue. The V P5 are the 
observed visibilities, which are drawn from a normal distri¬ 
bution of mean M pq (the unobserved, true visbilities) and 
standard deviation a, which is one of the parameters we es¬ 
timate with MCMC. These ‘true’ visibilities are governed by 
the RIME, which is here simplified graphically to two com¬ 
ponents, the brightness matrix, B, and the Jones’ matrices 
of the antennas, J p , J q . The flux densities of the 17 sources 
are represented by fi, which form components of B. The 
coefficients of the polynomial time-varying pointing errors, 
IjCk and rrijCk (where j represents the antenna number and 
k is the number of polynomial coefficient) enter the Jones 
matrices, along with the beam width, bus. 

4-1-4 Resulting measurement equation 
The RIME for this example problem is thus: 

V p „ = J] ( E mAM (l, + Sl p , m, + + Sl r , m, + 1 5m„)) , 

( 11 ) 

where s runs from 1 to 17 over all the sources. This brings 
the total to 103 parameters: 17 scientific (the flux densities 
of the sources) and 86 instrumental (84 pointing error pa¬ 
rameters, the beam width and the noise). The full model can 
be visualised in the Bayesian factor graph of Fig. 4 and a 
more detailed description of factor graphs is given in section 
A of the appendix. 

4.2 Using MCMC for joint parameter inference 

The initial step of our analysis was to choose an appropriate 
sky model in MeqTrees (specifying the brightness matrix in 
Eq.(6)) and select the telescope configuration corresponding 


to the dataset including all known sources of interference 
and instrumental errors (the Jones matrices in Eq.(6)). We 
vary all the parameters within the model - the flux densi¬ 
ties, pointing errors, beam width and noise - using MCMC. 
Fig. 1 illustrates how the sampling algorithm repeatedly calls 
MeqTrees with new parameter values and evaluates the like¬ 
lihood. MCMC uses the likelihood (Eq.(2)) to determine the 
best-fitting parameter values and to explore the surround¬ 
ing parameter space, thus determining the uncertainties and 
correlations for all parameters. 

4.3 Technical details and priors 

Due to the large volume of the parameter space, we use a 
standard, gradient-based optimisation algorithm to get close 
to the best-fitting parameter values and provide a good start¬ 
ing point for the MCMC. We run several chains in parallel, 
each of around 500, 000 steps, repeatedly computing and di- 
agonalising the covariance matrix to improve convergence, 
and we test convergence using the Gelman-Rubin statistic 
(Gelman & Rubin 1992). The estimated parameters and 
their uncertainties are determined by finding the mean and 
standard deviation (using percentiles) from the marginalised 
one-dimensional posterior for each parameter. For this par¬ 
ticular setup, MeqTrees takes about 0.4s for one likelihood 
calculation, parallelised using 4 cores of 2.2 GHz each. As 
10 chains were run, 40 cores in total were used resulting in 
approximately 55 CPU hours for convergence per dataset. 

We apply a uniform prior to the pointing error param¬ 
eters, restricting them to the broad range of ±200 arcsec- 
onds. We also restrict the beam width to be positive, and 
vary the noise on the visibilities in logarithmic space (with 
an infinitely broad prior in log-space). We do not restrict the 
ranges of the flux densities. 

4.4 Comparison with CLEAN plus source 
extraction 

To compare our technique with the standard approach, we 
apply CLEAN followed by a source-extraction algorithm to 
determine the flux densities of the sources (we call this 
combination CLEAN±SE), without any instrumental cali¬ 
bration. We do not use any calibration algorithms such as 
self-cal, because it would have no benefit: our dataset only 
has direction-dependent instrumental effects, whereas self- 
cal can only correct for direction-independent effects. Cur¬ 
rent approaches to direction-dependent calibration are of no 
help here because: 

(i) Direction-dependent solutions (such as peeling, or dif¬ 
ferential gains) can in principle solve for the variable gains 
induced by pointing error, given a prior source model. How¬ 
ever, this destroys information on the source, since devia¬ 
tions between the true sky and the prior model are com¬ 
pletely absorbed by such gain solutions. 

(ii) Pointing selfcal should in principle improve the 
CLEAN maps and thus produce better source model esti¬ 
mates. However, implementations of this remain unavailable 
to the public. 
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(iii) MeqTrees should in principle be able to do a 
maximum-likelihood solution for the source parameters and 
pointing errors simultaneously. However, only solutions for 
the latter has been demonstrated to work in practice and as 
we have argued, a maximum-likelihood solution produces a 
point estimate for the parameters which may be biased due 
to correlations. 

Instead, we apply a naive CLEAN algorithm, followed by 
source extraction, to compare with BIRO as a worst case 
scenario in the case of time-varying pointing errors. Note 
that we do provide prior information on the positions of the 
sources to CLEAN, in the form of CLEAN boxes. 

We use the CLEAN implementation (specifically the 
Cotton-Schwab algorithm) in the software package CASA 4 
to image the simulated datasets. The images were made with 
robust weighting with a robustness parameter of —1.0. We 
did 1000 iterations of CLEAN with a loop gain of 0.1. Inter¬ 
active cleaning was performed on the visibility data twice, 
once with masks defined around known source positions and 
then with masks defined around only those sources that were 
found during the cleaning procedure. The source extraction 
was performed interactively using PyBDSM to ensure that 
the artefacts were not wrongly identified as sources. 

4.5 Results 

To illustrate fitting a model to the raw data, we plot a subset 
of the visibilities in Fig. 5 with the best-fit visibilities as ob¬ 
tained by BIRO. Fig. 7 (with numerical details in Table 1) 
shows the comparison between the flux densities obtained 
by CLEAN+SE and those by BIRO. The flux densities of 
CLEAN+SE are on average biased due to undealt-with cor¬ 
relations with the pointing errors and underestimated uncer¬ 
tainties. Additionally, because of the time-varying pointing 
errors corrupting the data, CLEAN+SE only manages to 
find 5 of the 17 sources. With polynomial pointing errors 
included in the simulations, bright artefacts dominated the 
final image resulting in the weaker sources being swamped. 
In contrast, because these correlations are taken into ac¬ 
count, the Bayesian approach is able to recover the true flux 
densities for all sources and to determine error bars that 
include the effects of all nuisance parameters. Without the 
instrumental errors, BIRO achieves similar flux estimates to 
CLEAN+SE. 

Fig. 6 shows a subset of the covariance matrix between 
parameters and Fig. 8 shows an example 1 <t and 2 a contour 
plot between pairs of parameters. The key result of Fig. 6 
is that it highlights the significant and complex correlations 
between the pointing errors and flux densities, i.e. the in¬ 
strumental and science parameters, which therefore need to 
be estimated jointly allowing for the correlations. 

The (anti-)correlations between pointing errors and flux 
densities are easy to understand qualitatively. Consider a 
source on the flank of the main lobe of the primary beam, 
e.g. on the half-power point. If a given antenna mispoints 

4 Common Astronomy Software Applications, http://casa. 
nrao.edu/ 



Time (MJD) +4.76919e9 


Figure 5: Example of fitting a model to the raw data. Plotted 
are the real component of the visibilities for a single baseline 
(between antenna 0 and 1) and for the single channel of the 
data, in black. The best fitting model line is overplotted in 
dark blue, with a band of uncertainty of 0.1 Jy (the original 
noise added to the simulation) in pale blue. 

towards the source, the source will be subject to a higher 
primary beam gain, in other words, it will be perceived as 
brighter by all baselines involving that antenna. Mispointing 
away from the source has the opposite effect. The nature of 
the correlation will also strongly depend on the position of 
the source with respect to the pointing centre. For example, 
a source near the centre of the main lobe (i.e. on a ‘flat’ 
part of the primary beam pattern) will correlate very weakly 
with pointing error, while a source on the inner flank of 
the first sidelobe will correlate with mispointing away rather 
than towards. Since different baselines contribute to different 
Fourier mode measurements, pointing error will also have 
a complicated interaction with perceived source structure. 
Similar arguments apply to beamwidth. 

Deriving the exact quantitative nature of this correla¬ 
tion analytically is highly impractical, which is why a tech¬ 
nique like BIRO proves so powerful. This covariance matrix 
could be used to assist in calibration, study calibration pa¬ 
rameters or as input to future MCMC analyses on similar 
datasets. 


5 EXAMPLE 2: MODEL COMPARISON 

In this example problem, we show that BIRO is able, using 
model selection (Jeffreys 1998; Trotta 2008), to choose the 
correct model in each of three different cases, distinguish¬ 
ing between an extended source, an unresolved point source 
and two close (sub-synthesised-beam) sources. The sources 
recovered are all smaller than the synthesised beam. This 
is known as super-resolution and has recently been shown 
to be possible with compressive sensing (Wiaux et al. 2009; 
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Figure 6: Covariance matrix between a subset of parameters illustrating the strong correlations between the science and 
instrumental parameters that must be accounted for to achieve unbiased results. The parameters are listed on each axis with 
the correlations between them represented by a coloured ellipse, either positive (red ellipse angled to right) or negative (blue 
ellipse angled to left). The leading diagonal shows the one-dimensional marginalised posterior for each parameter. For the 
pointing errors, IjCk refers to the fc’th coefficient of the polynomial time-varying pointing error in the right ascension direction 
for the j’th antenna and rrijCh is the same for the declination direction. The flux densities of the 17 sources are given by /;, 
ordered from brightest to faintest, and bw and sigma represent the beam width and noise on the visibilities respectively. 


Li, Cornwell & de Hoog 2011; Carrillo, McEwen & Wiaux 
2012, 2014; Honma et al. 2014) (and to some extent Martf- 
Vidal, Perez-Torres & Lobanov (2012)). Here we use the 
Bayesian evidence to determine the correct model of these 
sub-synthesised-beam sources, with statistical significance. 
Although in this example problem we exclude instrumental 
effects, they can, in general, be included as in example 1. 


5.1 Simulated datasets and models 

The datasets for this example use the same frequency, band¬ 
width, integration time and noise characteristics as the 
dataset simulated in section 4. We simulate three datasets 
with three different sky models with all the sources away 
from the phase centre: a point source, a sub-synthesised- 
beam extended source modelled as a Gaussian and two point 
sources separated by the distance the size of that Gaussian. 
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Name 

RA (°) 

Dec (°) 

True Flux Density (Jy) 

C+SE Flux Density (Jy) 

BIRO Flux Density (Jy) 

FI 

15.216 

68.139 

3.1285 

11.9723 + 0.2240 (39.5cr) 

3.1410l°;° 8 7 f 0 

(0.2(j) 

F2 

15.017 

66.849 

1.6030 

61.4220 ± 1.3557(44.1cr) 

1.5834«;°i 88 

(1.1(7) 

F3 

16.583 

67.662 

0.6719 

0.9345 ± 0.0415 (6.3o-) 

0 6476 +0 ' 0478 

U.U^<U_oo410 

(0.5o-) 

F4 

14.578 

67.815 

0.6170 

0.3942 ±0.0191(U.7o-) 

0.61141“ 

(0.2(7) 

F5 

16.315 

67.853 

0.5648 

0.6579 + 0.0372 (2.5o) 

0.56351“ 

(0.0a) 

F6 

14.943 

68.061 

0.4115 

Not found 

0 4127 - *” 0 ' 0135 
/ -0.0112 

(0.1a) 

F7 

16.075 

68.010 

0.2640 

Not found 

0 2671 + 0 - 0093 

u.z,u< -*-—0.0093 

(0.3a) 

F8 

14.822 

67.676 

0.1293 

Not found 

0-1295l°;“Il 

(0.1a) 

F9 

16.546 

67.396 

0.0919 

Not found 

0.08051“ 

(1-6 a) 

F10 

16.207 

67.138 

0.0742 

Not found 

0 0629 +0- ° 064 
u.uozy_ 0 0064 

(1.8a) 

Fll 

16.001 

67.323 

0.0645 

Not found 

0 O t 59 ( l +0 ' 0029 
u.uuyy_ 0 .oo23 

(1.6a) 

F12 

14.541 

67.175 

0.0524 

Not found 

0 041S" 1 " 0 ’ 0041 

U.U4:±0_o Q031 

(2.7a) 

F13 

16.598 

67.370 

0.0487 

Not found 

0 041 q+ 0 - 0040 

U.U4:lb>_0.0040 

(1.7a) 

F14 

14.186 

67.579 

0.0451 

Not found 

0.0430l°;°“i 

(0.7a) 

F15 

14.496 

67.680 

0.0357 

Not found 

0.03541°;°™ 

(0.2a) 

F16 

14.085 

67.566 

0.0306 

Not found 

0.02891" 

(0.6 a) 

F17 

14.891 

67.188 

0.0301 

Not found 

0-02591°;°™ 

(2.5a) 


Table 1: Comparison between the CLEAN+source extraction results (shortened to C+SE) and the BIRO results for the flux 
densities (in Jy) of the sources in the dataset. The bias in terms of number of standard deviations away from the true flux 
density is given in brackets. For the five sources CLEAN+SE found, the error on the position was less than 10~ 4 degrees. 


No instrumental effects were included in the model-selection 
simulations and the beam width and noise were assumed to 
be known. Fig. 10 shows the input model for all three cases 
in the left column. 

The point sources are parametrized by the Stokes I flux 
density and the position as the distance from the phase cen¬ 
tre, along two mutually perpendicular axes, l and m. The 
extended Gaussian source has three more parameters in the 
form of the projections of the major axis on the l and m axes 
and the ratio of the minor to major axis, defined as: 


l± = e ma j sin(a) 

(12) 

m _l = e ma j cos(a) 

(13) 

r — Cmin/Cmaj, 

(14) 


where e ma j and e m in are the major and minor axes of the 
Gaussian source and a is the position angle (the angle of 
rotation of the extended source). See Fig. 9 for a visual de¬ 
scription. The brightness matrix of Eq.(6) for an extended 
Gaussian is simply the product of a Gaussian and the bright¬ 
ness matrix for a point source. The RIME is simple in this 
example, since there are no instrumental effects apart from 
the usual phase shift between antennas: 

V p? “Z)(// K p ) f(I™)B P s° mT Kf ) H dldm S J , (15) 

where f(l, m) is a Gaussian in l and m for the extended 
source case and / is a delta function for the one and two- 


source models. Also in the one and two-source models, l and 
m reduce to single points l s and m 3 , as in Eq.(ll). 


5.2 Using MultiNest for model selection 

We use MultiNest for calculating the Bayesian evidence (see 
section 2) and MeqTrees for predicting the model visibilities 
from the sampled source parameters from which the like¬ 
lihood is computed iteratively. The likelihood is computed 
according to Eq.(2). The posterior probability distributions 
are obtained as a by-product along with the uncertainties in 
the best-fit parameter values and the Bayesian evidence. 

For the single-point-source model, we vary three param¬ 
eters: the flux density and relative source position, l and 
m. We similarly vary the flux densities and positions of the 
two sources in the two-source model. The Gaussian extended 
source model has six parameters: the flux density, position 
coordinates, and the shape parameters ( l±. m±, r). 

We generate a unique, simulated dataset for each of the 
three cases and then fit each of the three models to them, to 
see if the correct model is selected in each case. MultiNest 
fits for the parameters, their uncertainties and correlations 
(just as MCMC does in example 1), but also returns the 
evidence, Z (D\H) (the probability of the data, given the 
hypothesis). By taking the ratio of evidences, one can deter¬ 
mine whether one model is favoured over another, and by 
how much. The Jeffrey’s scale (Jeffreys 1998; Trotta 2008) 
provides an intuitive way of deciding whether the evidence 
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Figure 7: Estimated vs true flux densities of the sources 
with error bars as estimated by BIRO (blue circles) and 
by a CLEAN+Source Extraction algorithm (red triangles). 
Note that CLEAN+SE only detects 5 out of 17 sources. The 
BIRO error bars are the standard deviation of marginalised 
one-dimensional posterior for each flux parameter. While the 
BIRO results are unbiased, CLEAN+SE has two problems: 
it underestimates the error bars and yields biased estimates 
of the flux densities of up to 44<r. The reader is reminded 
that this dataset contains no direction-independent effects 
that may normally cause biases in a CLEAN analysis; these 
biases are instead due entirely to the complexities in the 
dataset introduced by the time-varying pointing errors. 

is strong enough to select a model, based on odds derived 
directly from the evidence. 


5.3 Technical details and priors 

We use uniform priors for all the source parameters. The 
flux density is restricted to the range 0 to 2 Jy. The position 
parameters are allowed to be both positive and negative in 
the range -25” to 25” since the position is measured relative 
to the phase centre. For the shape parameters of the ex¬ 
tended source, (l± and m±), we allow the prior ranges to be 
big enough to encompass the point-spread-function (PSF) of 
the interferometer and no more, since we are dealing with 
sub-synthesised-beam sources. This translates to a range of 
0” to 20” for l _l and -20” to 20” for m±. Finally, we restrict 
the minor-to-major axis ratio (r) to be positive, but less than 
unity to be physically meaningful. We found that using 1000 
live points achieved good results from MultiNest. 



flux i 7 



Figure 8: Credible interval contour plots between a subset of 
parameters. The la and 2cr probability densities are shown 
in dark and light colours respectively. The true (input) pa¬ 
rameters are marked with a black star. The pairs of parame¬ 
ters are: Upper left: The two highest order coefficients of the 
pointing error in the right ascension direction for antenna 
9. Upper right: Flux densities of two of the sources. Lower 
left: The flux density of the 17th source vs the beam width. 
Lower right: The constant term from the polynomial point¬ 
ing error in the right ascension direction for antenna 10 vs 
the flux density of the 15th source. 



Figure 9: The parameterisation of a Gaussian extended 
source in MeqTrees. Here, e ma j and e m i n are the major and 
minor axes of the Gaussian and a is the position angle. 
MeqTrees uses l±, m± and r = e m in/e m aj in its parame¬ 
terisation of a Gaussian. 


5.4 Results 

The relative logarithmic evidences are computed for each 
model giving the relative confidence with which one model 
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Hypothesis 

Simulation 

A 

B 

c 

A 

1 : 1 

10 s93 : 1 

10 72O ° : 1 

B 

10 993 : 1 

1 : 1 

10 5079 : 1 

C 

62 : 1 

857 : 1 

1 : 1 


Table 2: Relative evidences for each model in each simu¬ 
lated dataset. A is the two-source model, B is the extended 
source model and C is the one-source model. The evidences 
are relative to the model used to generate the dataset (so, 
for example, for the two-point-source dataset, the evidence 
for each model is compared to the two point source model). 
The maximum error in log-evidence is 1.5. High odds indi¬ 
cate the input model is favoured (as it is in all three cases), 
showing that nested sampling selects the correct model at 
high significance (at a SNR of 1000). 

is preferred over another (see Table 2). We find that the cor¬ 
rect hypothesis is selected in all cases, at odds of 10 593 :1, 
10 993 :1 and 62:1, for the two-point-source, extended-source 
and single-point-source models respectively. Using model se¬ 
lection, BIRO is able to select the correct model in all three 
cases (the model with the highest evidence), showing it 
can perform source separation even on sub-synthesised-beam 
scales. 

We computed a ‘best-fitting’ image by running 
MeqTrees with the maximum posterior model and param¬ 
eters in each of the three cases, to compare with the 
CLEANed image (see Fig. 10). We use the same CLEAN 
parameters as in section 4.4. The CLEANed images (at least 
in this case, without an enforced smaller beam size) are un¬ 
able to reach the sub-synthesised beam scales achievable by 
BIRO. 

In Fig. 11, we determine the point at which model se¬ 
lection fails to distinguish an extended source from a point 
source for different source sizes and signal-to-noise ratios 
(SNRs). Any evidence lower than ‘strong’ is not usually con¬ 
sidered high enough to say either way which model is cor¬ 
rect. Perhaps obviously, at high SNR extremely small sources 
can be detected (around 1.0 arcseconds) and sources become 
more difficult to distinguish as the SNR is reduced. 

Video 1 in the online-only content shows visually how 
MultiNest converges to the correct model, exploring the pos¬ 
terior as it goes, for the extended source model. Each frame 
is an image generated using the parameters from every 40 th 
step of the chain. 


6 DISCUSSION AND CONCLUSIONS 

We have introduced the technique Bayesian Inference for Ra¬ 
dio Observations (BIRO), a Bayesian approach to the decon¬ 
volution problem of radio interferometry. Instead of making 
an image and then performing source extraction, BIRO uses 
MCMC or nested sampling to fit models directly to the vis- 


o 



Figure 10: Left column: The true sky for the extended Gaus¬ 
sian, single point source and two point sources models (from 
top to bottom). Middle column: The CLEANed image for 
the three models. Right column: The maximum posterior 
BIRO image for the three models. The purple contour in 
each image indicates the size of the synthesised beam, as 
returned by CLEAN (note that the sources are all much 
smaller than the synthesised beam). BIRO recovers the cor¬ 
rect input model each time while CLEAN is unable to dis¬ 
tinguish between the models at the same SNR (in this case 
the SNR was 1000). 


ibility data and obtain the posterior for the parameters of 
interest, as well as nuisance parameters. 

In the first example problem, we focused on the rela¬ 
tionship between scientific and instrumental parameters. It 
was found that all parameter estimates from BIRO were con¬ 
sistent within their error bars with the true values. As well 
as determining the uncertainties of the parameters, BIRO 
also returns the covariance matrix between them, as a by¬ 
product of the full posterior. Our work shows these correla¬ 
tions are complicated and non-negligible. BIRO effortlessly 
incorporates the effects of the correlations in the estimates 
of the marginalised uncertainties on the individual parame¬ 
ters, as well as providing a way to study these correlations in 
the form of the covariance matrix. We compared our results 
to a standard CLEAN algorithm, without calibration (since 
our simulated data contains only direction-dependent effects 
and publicly available calibration algorithms only deal with 
direction-independent effects). Because of the time-varying 
pointing errors we introduce to the dataset, CLEAN is only 
able to find 5 out of the 17 sources and returns biased flux 
densities for them, while BIRO returns unbiased flux densi¬ 
ties for all sources. BIRO is also able to correctly determine 
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Gaussian source size (arcseconds) 


Figure 11: Relative natural log-evidence (i.e. the natural log¬ 
arithm of the ratio of the Bayesian evidence for the true 
model to that of a single point source) as a function of Gaus¬ 
sian source size, for the extended source input model, show¬ 
ing the evidence-crossover points for different source sizes 
and signal-to-noise ratios (peak flux to background noise). 
The horizontal axis gives the size of the circular Gaussian 
source in the input model (the reader is reminded that the 
FWHM of the synthesised beam is around 13 arcseconds). 
The vertical axis gives the odds in favour of the Gaussian 
source model when model comparison is performed for the 
Gaussian model against a point-source model. The more pos¬ 
itive the relative log-evidence is, the more strongly is the 
Gaussian model favoured. Each curve on the graph is for 
a different noise level with the approximate (map) SNRs 
shown in the legend. 

the coefficients of the time-varying pointing errors, the pri¬ 
mary beam width and the noise on the visibilities. 

In the second example problem, we addressed the issue 
of how to determine the best sky model for the data. We 
worked with three models: a single point source, a Gaussian 
extended source and two point sources. We simulated data 
for each of the three models and then, for each dataset, ran 
MultiNest to fit each of the three models and determine 
the Bayesian evidence. The evidence then determines the 
selection of the correct model. All of the sources detected 
were several times smaller than the synthesised beam, hence 
we successfully achieved super-resolution as well as source- 
separation. 

This paper constitutes a proof of concept but more work 
is required before the technique can be easily applied to in¬ 
terferometric dataset: 

(i) Firstly, while using a WSRT simulation has relevance 
to the SKA due to the similar instrumental setup, the SKA 
will have many more antennas (on the order of a thou¬ 
sand) which will of course result in many more instrumental 
parameters (and indirectly more science parameters as the 
source count increases with sensitivity). Fortunately, while 
the number of instrumental parameters scales as the number 
of antennas, N, the number of datapoints scales as the num¬ 
ber of baselines, i.e. 0(N 2 ), meaning it is plausible that one 
could simultaneously determine the sky and instrumental 
parameters for large N. While the precedent for sampling an 
extremely large parameter space exists (Jasche & Wandelt 
2012), new and sophisticated sampling techniques (Duane 


et al. 1987; Neal 2012; Goodman & Weare 2010; Foreman- 
Mackey et al. 2013) (which are also easily parallelised) will be 
required to improve convergence in the thousand-parameter 
regime, especially as the non-linear nature of the modelling 
makes sampling inefficient (as addressed in Jasche & Wan¬ 
delt (2012)). 

(ii) Secondly, the Bayesian approach is far more computa¬ 
tionally intensive than standard deconvolution, taking hours 
(55 CPU hours in the case of example 1) to converge to the 
correct posterior distribution. The complexity of the likeli¬ 
hood computation scales as the number of antennas squared 
(i.e. the number of baselines), making an SKA-like compu¬ 
tation difficult with the current setup. However, the RIME 
is intrinsically highly parallelisable allowing an efficient im¬ 
plementation of MeqTrees on GPUs. Preliminary work on a 
GPU implementation indicates a speed-up of the likelihood 
computation of about 250 times (Perkins et al. 2015). This 
means this technique can be applied to data from existing 
telescopes such as ALMA (Hills & Beasley 2008) and LO- 
FAR (van Haarlem M. P. et al. 2013), using current computer 
clusters. 

(iii) Thirdly, we need to address the problem of not 
knowing the sky model beforehand, which is a common 
difficulty when dealing with calibration but is particularly 
important here, as a Bayesian analysis relies on a good 
model. There are a number of ways to tackle this issue which 
we hope to address in future publications. A simple, but 
computationally-intensive, solution would be to run several 
different models (with increasing numbers of sources) and 
select between them using the Bayesian evidence. Another 
possible approach is to use a deconvolution algorithm, 
like CLEAN or RESOLVE, to get an initial set of sources 
and then iterate between deconvolution and the best fit of 
BIRO to get a subsequently better model. A more rigorous 
solution would be to use an algorithm like birth-death 
(Stephens 2000) or reversible jump (Green 1995) MCMC, 
which is able to determine both the number of parameters 
required and the posterior for them simultaneously. A 
further possibility is to combine the more general approach 
proposed in Sutter et al. (2014b) and Junklewitz et al. 
(2013), that divides up the held into many ‘pixels’ that 
are then allowed to vary, with the calibration capabilities 
of BIRO to produce estimates of the sky model. This is 
even more computationally challenging however but would 
provide a more general and robust solution. 


BIRO is not only useful for dealing with systematics, which 
will become more important as telescopes become more sen¬ 
sitive, but it is also a powerful technique for lending statisti¬ 
cal strength to topical scientific questions. Potential applica¬ 
tions include: structures of black hole systems, jet emission 
in active galaxies, time variability of objects and radio weak 
lensing. BIRO allows a holistic way to include instrumental 
effects while at the same time returning the science we are 
interested in. By leveraging the power of Bayesian statistics, 
BIRO uses all information available to get the most out of 
interferometric datasets. 
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Video 1: Online only (also available at https://vimeo.com/ 
117391380). Images generated from the MultiNest chain for 
the extended Gaussian source dataset and model. At every 
40th step in the chain, that step’s parameters were used to 
generate an image of the field. The parameters are at first 
quite variable but soon converge to the correct shape, posi¬ 
tion and flux density for the source. The sample probability, 
which is the normalised posterior for that point, improves as 
the chain converges to the correct parameter values. 
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APPENDIX A: BAYESIAN FACTOR GRAPHS 

Here we introduce Bayesian factor graphs, useful tools for visu¬ 
alising Bayesian models, which we use to describe the model of 
section 4. We make use of the directed factor graph notation, de¬ 
veloped in Dietz (2010), to visualise how the parameters in our 
models depend on one another. Table Al defines the graphical 
primitives of a factor graph. Figure Al demonstrates the use of 
the factor graph notation in a simple example. 



Figure Al: A simple example factor graph. In this model, 
the data are represented by a vector Xj, which we suspect is 
normally distributed. This is modelled by a normal distribu¬ 
tion (represented by the factor labeled A/*) which is governed 
by the parameters n and a. These constants would be the 
parameters we would want to estimate with an MCMC or 
MultiNest analysis. 
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Type 


Latent variable 


Observed variable 


Deterministic 


Constant 


Description 

An unobserved, undetermined variable 

The data 

A variable completely determined by other variables 
A “true” variable. We usually want to determine these 


Output 

■O' 

© 

© 


N 


Factor 


A distribution to be drawn from 


Plate 


Repeats variables (all the sources in the sky model for example) 



m € M 


Table Al: Factor graph node types (adapted from Dietz (2010)). The concept of a plate is worth an extra mention. Frequently 
in models variables are repeated, such as the 17 flux densities or 14 sets of pointing errors in our model in example 1. A plate 
in a factor graph allows one to easily show these variables are repeated, but each can have a unique value. So in the case of 
the source flux densities, m would range from 1 to 17, the value of M. 
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